An analysis of temporal-difference learning with function approximation
نویسندگان
چکیده
منابع مشابه
An Analysis of Temporal-Difference Learning with Function Approximation
We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator online during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with pr...
متن کاملAn Analysis of Temporal Di erence Learning with Function Approximation
We discuss the temporal di erence learning algorithm as applied to approximating the cost to go function of an in nite horizon discounted Markov chain The algorithm we analyze updates parameters of a linear function approximator on line during a single endless traject ory of an irreducible aperiodic Markov chain with a nite or in nite state space We present a proof of convergence with probabili...
متن کاملMitigating Catastrophic Forgetting in Temporal Difference Learning with Function Approximation
Neural networks have had many great successes in recent years, particularly with the advent of deep learning and many novel training techniques. One issue that has prevented reinforcement learning from taking full advantage of scalable neural networks is that of catastrophic forgetting. The latter affects supervised learning systems when highly correlated input samples are presented, as well as...
متن کاملConvergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning and Sarsa have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximat...
متن کاملOff-Policy Temporal Difference Learning with Function Approximation
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Offpolicy learning is of interest because it forms the basis for popular reinforcement learning methods such as Q-learning, which has been known to diverge with linear function approximation, and because it is critical to the practical utility of multi-scale, multi-goa...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE Transactions on Automatic Control
سال: 1997
ISSN: 0018-9286
DOI: 10.1109/9.580874